WHAT IS CLAIMED IS: 

1. A voice region detection apparatus, comprising: 

a preprocessing unit for dividing an input voice signal into frames; 
a whitening unit for combining white noise with the frames input from the 
preprocessing unit; 

a random parameter extraction unit for extracting random parameters indicating 
the randomness of frames from the frames input from the whitening unit; 

a frame state determination unit for classifying the frames into voice frames and 
noise frames based on the random parameters extracted by the random parameter extraction 
unit; and 

a voice region detection unit for detecting a voice region by calculating start and 
end positions of a voice based on the voice and noise frames input from the frame state 
determination unit. 

2. The apparatus as claimed in claim 1, wherein the preprocessing unit samples the 
input voice signal according to a predetermined frequency and divides the sampled voice 
signal into a plurality of frames. 

3. The apparatus as claimed in claim 2, wherein the plurality of frames overlap with 
one another. 
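
Illustrative note on claims 2 and 3: the division of a sampled signal into overlapping frames might be sketched as follows. The function name `split_into_frames` and the parameters `frame_len` and `hop` are hypothetical, and dropping trailing samples that do not fill a whole frame is an assumption; the claims do not fix any of these details.

```python
def split_into_frames(samples, frame_len, hop):
    """Split a list of samples into frames of frame_len samples.

    Consecutive frames start hop samples apart, so hop < frame_len
    produces overlapping frames. Trailing samples that do not fill a
    complete frame are dropped (an assumption of this sketch).
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]
```

With `frame_len=4` and `hop=2`, each frame shares half of its samples with the next one.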

4. The apparatus as claimed in claim 1, wherein the whitening unit comprises a white 
noise generation unit for generating the white noise, and a signal synthesizing unit for 
combining the frames input from the preprocessing unit with the white noise generated by 
the white noise generation unit. 
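
Illustrative note on claim 4: a minimal sketch of a white noise generator and a signal synthesizer, assuming Gaussian white noise and sample-wise addition. The names `generate_white_noise` and `whiten_frame`, the `amplitude` default, and the choice of a Gaussian distribution are assumptions made for illustration; the claim only requires that white noise be generated and combined with the frames.

```python
import random


def generate_white_noise(length, amplitude, rng):
    """Return `length` independent zero-mean Gaussian samples."""
    return [rng.gauss(0.0, amplitude) for _ in range(length)]


def whiten_frame(frame, amplitude=0.01, rng=None):
    """Combine a frame with white noise by sample-wise addition.

    amplitude is the standard deviation of the added noise
    (a hypothetical tuning parameter, not taken from the claims).
    """
    rng = rng or random.Random()
    noise = generate_white_noise(len(frame), amplitude, rng)
    return [s + w for s, w in zip(frame, noise)]
```

Passing a seeded `random.Random` instance makes the added noise reproducible, which is convenient when comparing frame classifications across runs.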

5. The apparatus as claimed in claim 1, 2, 3 or 4, wherein the random parameter 
extraction unit calculates the numbers of runs consisting of consecutive identical elements 
in the frames subjected to the whitening by the whitening unit and extracts the random 
parameters based on the calculated numbers of runs. 

6. The apparatus as claimed in claim 5, wherein the random parameter is: 

$$NR = \frac{R}{n}$$

where NR is a random parameter of a frame, n is a half of the length of the frame, and R is 
the number of runs in the frame. 
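
Illustrative note on claims 5 and 6: the run count R and the random parameter NR = R/n might be computed as sketched below. Converting each sample to a binary element by its sign is an assumption of this sketch, since the claims do not state how the elements are obtained; the names `count_runs` and `random_parameter` are likewise hypothetical.

```python
def count_runs(elements):
    """Count runs, i.e. maximal blocks of consecutive identical elements."""
    if not elements:
        return 0
    runs = 1
    for prev, cur in zip(elements, elements[1:]):
        if cur != prev:
            runs += 1
    return runs


def random_parameter(frame):
    """Return NR = R / n, where n is half the frame length.

    Each sample is mapped to 1 if non-negative, else 0 (an assumed
    binarization; the claims do not specify one).
    """
    if not frame:
        raise ValueError("frame must not be empty")
    bits = [1 if s >= 0 else 0 for s in frame]
    n = len(bits) / 2
    return count_runs(bits) / n
```

Under this binarization, a frame whose sign alternates on every sample gives NR = 2, while a frame that never changes sign gives NR = 1/n.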

7. The apparatus as claimed in claim 1 or 6, wherein the voice frames include vocal 
frames and fricative frames. 

8. The apparatus as claimed in claim 7 , wherein the frame state determination unit 
determines that if the random parameter of a frame extracted by the random parameter 
extraction unit is below a first threshold, the relevant frame is a vocal frame. 

9. The apparatus as claimed in claim 8, wherein the first threshold is 0.8. 

10. The apparatus as claimed in claim 8, wherein the frame state determination unit 
determines that if the random parameter of a frame extracted by the random parameter 
extraction unit is above a second threshold, the relevant frame is a fricative frame. 

11. The apparatus as claimed in claim 10, wherein the second threshold is 1.2. 

12. The apparatus as claimed in claim 10, wherein the frame state determination unit 
determines that if the random parameter of the frame extracted by the random parameter 
extraction unit is above the first threshold and below the second threshold, the relevant 
frame is a noise frame. 

13. The apparatus as claimed in claim 12, wherein the first threshold is 0.8, and the 
second threshold is 1.2. 
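
Illustrative note on claims 8 to 13: the three-way frame decision can be sketched as a pair of threshold comparisons using the values 0.8 and 1.2 recited above. The function name `classify_frame` and the string labels are hypothetical, and treating a value exactly equal to a threshold as noise is an assumption, since the claims only address values strictly below or above the thresholds.

```python
def classify_frame(nr, first_threshold=0.8, second_threshold=1.2):
    """Classify a frame from its random parameter nr.

    Below the first threshold: vocal frame.
    Above the second threshold: fricative frame.
    Otherwise: noise frame (boundary values included by assumption).
    """
    if nr < first_threshold:
        return "vocal"
    if nr > second_threshold:
        return "fricative"
    return "noise"
```

For example, `classify_frame(0.5)` gives a vocal frame, `classify_frame(1.0)` a noise frame, and `classify_frame(1.5)` a fricative frame.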

14. The apparatus as claimed in claim 1, further comprising a color noise elimination 
unit for eliminating color noise from the voice region detected by the voice region 
detection unit. 

15. The apparatus as claimed in claim 10, further comprising a color noise elimination 
unit for eliminating color noise from the voice region detected by the voice region 
detection unit, wherein the color noise elimination unit eliminates the color noise from the 
detected voice region if the random parameter of the voice region detected by the voice 
region detection unit is below a predetermined threshold. 

16. The apparatus as claimed in claim 15, wherein the predetermined threshold is a 
value obtained by subtracting the amount of reduction in the random parameter due to the 
color noise from the first threshold. 

17. The apparatus as claimed in claim 15, wherein the predetermined threshold is a 
value obtained by subtracting the amount of reduction in the random parameter due to the 
color noise from the second threshold. 

18. A voice region detection method, comprising the steps of: 
(a) if a voice signal is input, dividing the input voice signal into frames; 

(b) performing whitening of surrounding noise by combining white noise with the 
frames; 

(c) extracting random parameters indicating randomness of frames from the frames 
subjected to the whitening; 

(d) classifying the frames into voice frames and noise frames based on the 
extracted random parameters; and 

(e) detecting a voice region by calculating start and end positions of a voice based 
on the voice and noise frames. 
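
Illustrative note on step (e): one possible way to obtain start and end positions from per-frame decisions is to report each maximal block of consecutive voice frames as a region. This is only a sketch under that assumption; the claim does not prescribe how the positions are calculated, and the name `detect_voice_regions` is hypothetical.

```python
def detect_voice_regions(is_voice):
    """Return (start, end) frame indices of each block of voice frames.

    is_voice is a sequence of booleans, one per frame, where True marks
    a voice frame and False a noise frame. Both indices are inclusive.
    """
    regions = []
    start = None
    for i, voiced in enumerate(is_voice):
        if voiced and start is None:
            start = i  # a voice region begins here
        elif not voiced and start is not None:
            regions.append((start, i - 1))  # region ended on the previous frame
            start = None
    if start is not None:
        regions.append((start, len(is_voice) - 1))  # region runs to the last frame
    return regions
```

Frame indices can be converted to sample positions by multiplying by the frame hop used in step (a).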

19. The method as claimed in claim 18, wherein step (a) comprises the step of 
sampling the input voice signal according to a predetermined frequency and dividing the 
sampled voice signal into a plurality of frames. 

20. The method as claimed in claim 19, wherein the plurality of frames overlap with 
one another. 

21. The method as claimed in claim 18, wherein step (b) comprises the steps of: 
generating the white noise, and 

combining the frames with the generated white noise. 

22. The method as claimed in claim 18, 19, 20 or 21, wherein step (c) comprises the 
steps of: 

calculating the numbers of runs consisting of consecutive identical elements in the 
frames subjected to the whitening, and 
extracting the random parameters by dividing the calculated numbers of runs by 
lengths of the frames. 

23. The method as claimed in claim 22, wherein the random parameter is: 

$$NR = \frac{R}{n}$$

where NR is a random parameter of a frame, n is a half of the length of the frame, and R is 
the number of runs in the frame. 

24. The method as claimed in claim 18 or 23, wherein the voice frames include vocal 
frames and fricative frames. 

25. The method as claimed in claim 24, further comprising the step of determining that 
if the extracted random parameter of the frame is below a first threshold, the relevant frame 
is a vocal frame. 

26. The method as claimed in claim 25, wherein the first threshold is 0.8. 

27. The method as claimed in claim 25, further comprising the step of determining that 
if the extracted random parameter of the frame is above a second threshold, the relevant 
frame is a fricative frame. 

28. The method as claimed in claim 27, wherein the second threshold is 1.2. 

29. The method as claimed in claim 27, further comprising the step of determining that 
if the extracted random parameter of the frame is above the first threshold and below the 
second threshold, the relevant frame is a noise frame. 

30. The method as claimed in claim 29, wherein the first threshold is 0.8, and the 
second threshold is 1.2. 

31. The method as claimed in claim 27, further comprising the step of eliminating the 
color noise from the detected voice region if the random parameter of the voice region 
detected by the voice region detection unit is below a predetermined threshold. 

32. The method as claimed in claim 31, wherein the predetermined threshold is a value 
obtained by subtracting the amount of reduction in the random parameter due to the color 
noise from the first threshold. 

33. The method as claimed in claim 31, wherein the predetermined threshold is a value 
obtained by subtracting the amount of reduction in the random parameter due to the color 
noise from the second threshold. 